Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures

Authors

  • Peng Du
  • Matthew Parsons
  • Erika Fuentes
  • Shih-Lung Shaw
  • Jack Dongarra
Abstract

This paper presents optimizations to Principal Component Analysis (PCA) in GRASS GIS. The current GRASS implementation of PCA is based on eigenvalue decomposition, which has modest memory requirements but can suffer from poor runtime performance. On modern computers, significant performance improvements can be achieved by exploiting the memory hierarchy, most commonly by reusing data in frequently executed operations. The GRASS PCA reuses data only at a very high level, i.e., at the main-memory and I/O level. The implementation can be improved by rearranging the matrix computations and by using available high-performance packages whose block algorithms optimize data reuse. Further optimizations can be achieved by taking advantage of now-popular multi-core architectures and Graphics Processing Units (GPUs). By taking all of these key supercomputing components into account, GRASS GIS can leverage massively parallel computing without requiring a supercomputer. The speed-up achievable on multi-core CPU and GPU architectures will greatly improve the efficiency of GRASS GIS for its users. We use imaging spectrometer data to demonstrate the performance improvements attained by our implementation, which reduced runtime by nearly 99% using only multi-core optimizations and by an additional 50% using GPU optimizations.
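As a rough illustration of eigendecomposition-based PCA with the matrix work rearranged into block operations, the following is a minimal NumPy sketch. It is not the GRASS code or the authors' implementation; the function name `pca_eig`, the band layout (one row per band, one column per pixel), and the synthetic data are assumptions made for this example. The covariance matrix is formed with a single matrix product, which NumPy typically hands to whatever BLAS library it is linked against, instead of accumulating it pixel by pixel.

```python
import numpy as np


def pca_eig(bands):
    """PCA via eigendecomposition of the band covariance matrix.

    bands: array of shape (n_bands, n_pixels), one row per band
           (hypothetical layout chosen for this sketch).
    Returns eigenvalues in descending order, the matching eigenvectors
    as columns, and the data projected onto those eigenvectors.
    """
    X = np.asarray(bands, dtype=np.float64)
    n_pixels = X.shape[1]

    # Center each band on its mean.
    Xc = X - X.mean(axis=1, keepdims=True)

    # One matrix-matrix product yields the whole covariance matrix.
    cov = (Xc @ Xc.T) / (n_pixels - 1)

    # eigh is meant for symmetric matrices; it returns ascending eigenvalues.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Project the centered data onto the principal axes.
    scores = evecs.T @ Xc
    return evals, evecs, scores


# Synthetic stand-in for a 6-band image with 10,000 pixels.
rng = np.random.default_rng(0)
bands = rng.normal(size=(6, 10_000))
evals, evecs, scores = pca_eig(bands)
```

Because the covariance matrix is only n_bands by n_bands, the eigendecomposition itself is small; in a sketch like this, most of the time goes into the two large matrix products, which is where blocked, multi-threaded, or GPU-backed linear algebra libraries would be expected to help.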


Similar resources

A portable and high-performance matrix operations library for CPUs, GPUs and beyond

High-performance computing systems today include a variety of compute devices such as multi-core CPUs, GPUs and many-core accelerators. OpenCL allows programming different types of compute devices using a single API and kernel language. However, there is no standard matrix operations library in OpenCL for operations such as matrix multiplication that works well on a variety of hardware from mul...


GPU-based parallelization on principal components transformation

Techniques in band selection are usually used to select a subset of highly correlated data without losing their physical meaning for dimensionality reduction purpose. Among these techniques, principal components transformation (PCT) is the most commonly used in finding a new set of orthogonal bases (principal axes) that better captures spectral characteristics with the variance of transforme...


Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve four objectives: a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine...


Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures

This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targetin...


Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures

This paper presents a benchmarking, performance analysis and optimisation study of the OP2 “active” library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targetin...



Publication date: 2010